Scalable Parallel Matrix Multiplication on Distributed Memory Parallel Computers

Author

  • Keqin Li
Abstract

Consider any known sequential algorithm for matrix multiplication over an arbitrary ring with time complexity $O(N^\alpha)$, where $2 < \alpha \le 3$. We show that such an algorithm can be parallelized on a distributed memory parallel computer (DMPC) in $O(\log N)$ time by using $N^\alpha / \log N$ processors. Such a parallel computation is cost optimal and matches the performance of PRAM. Furthermore, our parallelization on a DMPC can be made fully scalable, that is, for all $1 \le p \le N^\alpha / \log N$, multiplying two $N \times N$ matrices can be performed by a DMPC with $p$ processors in $O(N^\alpha / p)$ time, i.e., linear speedup and cost optimality can be achieved in the range $[1..N^\alpha / \log N]$. This unifies all known algorithms for matrix multiplication on DMPC, standard or non-standard, sequential or parallel. Extensions of our methods and results to other parallel systems are also presented. The above claims result in significant progress in scalable parallel matrix multiplication (as well as solving many other important problems) on distributed memory systems, both theoretically and practically.
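
The cost-optimality and linear-speedup claims in the abstract can be checked with a short calculation. The sketch below writes $\alpha$ for the exponent of the chosen sequential algorithm ($2 < \alpha \le 3$) and uses symbols that do not appear in the abstract and are introduced here only for illustration: $T_1$ for the sequential time, $T_p$ for the time on $p$ processors, $p_{\max}$ for the largest processor count, and $S_p$ for the speedup. It also assumes the sequential bound is tight, $T_1 = \Theta(N^\alpha)$, which the abstract does not state explicitly.

```latex
\begin{align*}
T_1 &= \Theta(N^{\alpha}), \qquad 2 < \alpha \le 3 \\
p_{\max} &= \frac{N^{\alpha}}{\log N}, \qquad T_{p_{\max}} = O(\log N) \\
p_{\max} \cdot T_{p_{\max}} &= \frac{N^{\alpha}}{\log N} \cdot O(\log N) = O(N^{\alpha}) \\
T_p &= O\!\left(\frac{N^{\alpha}}{p}\right), \qquad 1 \le p \le p_{\max} \\
p \cdot T_p &= O(N^{\alpha}), \qquad S_p = \frac{T_1}{T_p} = \Omega(p)
\end{align*}
```

Under these assumptions the processor-time product never exceeds the sequential work by more than a constant factor, which is what cost optimality means, and the speedup grows at least linearly in $p$ over the whole range $[1..N^\alpha / \log N]$.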


Similar resources

Fast and Scalable Parallel Matrix

We present fast and highly scalable parallel computations for a number of important and fundamental matrix problems on linear arrays with reconfigurable pipelined optical bus systems. These problems include computing the Nth power, the inverse, the characteristic polynomial, the determinant, the rank, and an LU- and a QR-factorization of a matrix, and solving linear systems of equations. These c...

Fast and Scalable Parallel Matrix Computations with Optical Buses

We present fast and highly scalable parallel computations for a number of important and fundamental matrix problems on linear arrays with reconfigurable pipelined optical bus systems. These problems include computing the Nth power, the inverse, the characteristic polynomial, the determinant, the rank, and an LU- and a QR-factorization of a matrix, and solving linear systems of equations. These com...

A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure

The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication that can run on a Fibonacci Hypercube structure. Most popular algorithms for parallel matrix multiplication cannot run on a Fibonacci Hypercube structure, so a method that can run on all structures, especially the Fibonacci Hypercube structure, is necessary for parallel matr...

Pumma: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers

This paper describes the Parallel Universal Matrix Multiplication Algorithms (PUMMA) on distributed memory concurrent computers. The PUMMA package includes not only the non-transposed matrix multiplication routine $C = A \cdot B$, but also transposed multiplication routines $C = A^T \cdot B$, $C = A \cdot B^T$, and $C = A^T \cdot B^T$, for a block scattered data distribution. The routines perform efficiently for a wide ...

Fast matrix multiplication techniques based on the Adleman-Lipton model

Abstract. On distributed memory electronic computers, the implementation and association of fast parallel matrix multiplication algorithms has yielded astounding results and insights. In this discourse, we use the tools of molecular biology to demonstrate the theoretical encoding of Strassen’s fast matrix multiplication algorithm with DNA based on an n-moduli set in the residue number system, t...



Publication date: 2000